Consistency of cross validation for comparing regression procedures
Theoretical developments on cross validation (CV) have mainly focused on
selecting one among a list of finite-dimensional models (e.g., subset or order
selection in linear regression) or selecting a smoothing parameter (e.g.,
bandwidth for kernel smoothing). However, little is known about consistency of
cross validation when applied to compare between parametric and nonparametric
methods or within nonparametric methods. We show that under some conditions,
with an appropriate choice of data splitting ratio, cross validation is
consistent in the sense of selecting the better procedure with probability
approaching 1. Our results reveal interesting behavior of cross validation.
When comparing two models (procedures) converging at the same nonparametric
rate, in contrast to the parametric case, it turns out that the proportion of
data used for evaluation in CV does not need to be dominating in size.
Furthermore, it can even be of a smaller order than the proportion for
estimation while not affecting the consistency property.

Comment: Published at http://dx.doi.org/10.1214/009053607000000514 in the
Annals of Statistics (http://www.imstat.org/aos/) by the Institute of
Mathematical Statistics (http://www.imstat.org).
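The splitting-ratio question in the abstract above can be explored in a toy sandbox. The sketch below is illustrative only, not the paper's setup: the linear data-generating process, the two competing procedures (a least-squares line versus a nearest-neighbour average), and the evaluation fraction `eval_frac` are all assumed choices. It counts how often a single random split picks the procedure that is in fact better here (the linear fit, since the truth is linear).

```python
import numpy as np

rng = np.random.default_rng(0)

def cv_compare(n=200, eval_frac=0.5, reps=100):
    """Toy illustration: compare a parametric (linear) fit against a
    nonparametric nearest-neighbour average by sample splitting, and
    return the fraction of repetitions in which CV selects the linear
    fit (the better procedure under this linear truth)."""
    wins = 0
    for _ in range(reps):
        x = rng.uniform(0.0, 1.0, n)
        y = 2.0 * x + rng.normal(0.0, 0.3, n)   # truth is linear
        m = int(n * (1 - eval_frac))            # estimation portion
        order = rng.permutation(n)
        tr, te = order[:m], order[m:]
        # parametric procedure: least-squares line on the training half
        slope, intercept = np.polyfit(x[tr], y[tr], 1)
        err_lin = np.mean((y[te] - (intercept + slope * x[te])) ** 2)
        # nonparametric procedure: 1-nearest-neighbour prediction
        idx = np.abs(x[te][:, None] - x[tr][None, :]).argmin(axis=1)
        err_nn = np.mean((y[te] - y[tr][idx]) ** 2)
        wins += err_lin < err_nn
    return wins / reps
```

Varying `eval_frac` in such a sandbox is one way to get intuition for how the evaluation proportion affects selection consistency, though the toy compares a parametric and a nonparametric procedure rather than two procedures converging at the same nonparametric rate.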
Maximum Lq-likelihood estimation
In this paper, the maximum Lq-likelihood estimator (MLqE), a new
parameter estimator based on nonextensive entropy [Kibernetika 3 (1967) 30--35],
is introduced. The properties of the MLqE are studied via asymptotic analysis
and computer simulations. The behavior of the MLqE is characterized by the
degree of distortion q applied to the assumed model. When q is properly
chosen for small and moderate sample sizes, the MLqE can successfully trade
bias for precision, resulting in a substantial reduction of the mean squared
error. When the sample size is large and q tends to 1, a necessary and
sufficient condition to ensure proper asymptotic normality and efficiency of
the MLqE is established.

Comment: Published at http://dx.doi.org/10.1214/09-AOS687 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org).
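The Lq-likelihood replaces the logarithm in the ordinary log-likelihood with the deformed logarithm Lq(u) = (u^(1-q) - 1)/(1-q), which recovers log u as q tends to 1. A minimal numerical sketch for a normal model is below; it uses a fixed-point reweighting form of the estimating equations (each step weights observations by f(x)^(1-q), so for q < 1 points in the tails are downweighted). The contaminated-sample demonstration and the choice q = 0.9 are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def ml_q_normal(x, q=0.9, iters=50):
    """Sketch of maximum Lq-likelihood estimation for a normal model
    via fixed-point iteration: reweight each point by f(x)^(1-q),
    then update (mu, sigma) as the weighted mean and weighted SD.
    For q < 1 this downweights tail observations."""
    mu, sig = x.mean(), x.std()
    for _ in range(iters):
        f = np.exp(-0.5 * ((x - mu) / sig) ** 2) / (sig * np.sqrt(2 * np.pi))
        w = f ** (1.0 - q)
        mu = np.average(x, weights=w)
        sig = np.sqrt(np.average((x - mu) ** 2, weights=w))
    return mu, sig
```

On a standard-normal sample contaminated by a few large outliers, the MLqE location and scale estimates are pulled less by the outliers than the plain sample mean and standard deviation, illustrating the bias-for-precision trade described in the abstract.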
Sparsity Oriented Importance Learning for High-dimensional Linear Regression
With now well-recognized non-negligible model selection uncertainty, data
analysts should no longer be satisfied with the output of a single final model
from a model selection process, regardless of its sophistication. To improve
reliability and reproducibility in model choice, one constructive approach is
to make good use of a sound variable importance measure. Although interesting
importance measures are available and increasingly used in data analysis,
little theoretical justification has been provided. In this paper, we propose a new
variable importance measure, sparsity oriented importance learning (SOIL), for
high-dimensional regression from a sparse linear modeling perspective by taking
into account the variable selection uncertainty via the use of a sensible model
weighting. The SOIL method is theoretically shown to have the
inclusion/exclusion property: When the model weights are properly concentrated around the
true model, the SOIL importance can well separate the variables in the true
model from the rest. In particular, even if the signal is weak, SOIL rarely
gives variables not in the true model significantly higher importance values
than those in the true model. Extensive simulations in several illustrative
settings and real data examples with guided simulations show desirable
properties of the SOIL importance in contrast to other importance measures.
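The model-weighting idea behind an importance measure of this kind can be illustrated on a small scale. The sketch below is a simplified stand-in, not the SOIL procedure itself: it enumerates all subset models of a low-dimensional design, weights each by exp(-BIC/2) (the paper's candidate models and weighting are more sophisticated), and scores each variable by the total normalized weight of the models that include it.

```python
import itertools
import numpy as np

def soil_like_importance(X, y):
    """Illustrative subset-model weighting: fit every subset model by
    least squares, weight it by exp(-BIC/2), normalize the weights,
    and score variable j by the total weight of models containing j.
    Scores lie in [0, 1]; a variable in every high-weight model
    scores near 1."""
    n, p = X.shape
    scores = {}
    for subset in itertools.chain.from_iterable(
            itertools.combinations(range(p), k) for k in range(p + 1)):
        cols = list(subset)
        Z = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
        beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
        rss = np.sum((y - Z @ beta) ** 2)
        bic = n * np.log(rss / n) + (len(cols) + 1) * np.log(n)
        scores[subset] = np.exp(-0.5 * bic)
    total = sum(scores.values())
    imp = np.zeros(p)
    for subset, s in scores.items():
        for j in subset:
            imp[j] += s / total
    return imp
```

When one variable drives the response, models omitting it fit poorly and receive negligible weight, so its importance score is close to 1 while noise variables score low, mirroring the inclusion/exclusion behavior described in the abstract.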
Forecast Combination Under Heavy-Tailed Errors
Forecast combination has proven to be an important technique for obtaining
accurate predictions. In many applications, forecast errors exhibit
heavy tail behaviors for various reasons. Unfortunately, to our knowledge,
little has been done to deal with forecast combination for such situations. The
familiar forecast combination methods such as simple average, least squares
regression, or those based on variance-covariance of the forecasts, may perform
very poorly. In this paper, we propose two nonparametric forecast combination
methods to address the problem. One is specially proposed for situations in
which the forecast errors are strongly believed to have heavy tails that can be
modeled by a scaled Student's t-distribution; the other is designed for
relatively more general situations when there is a lack of strong or consistent
evidence on the tail behaviors of the forecast errors due to shortage of data
and/or evolving data generating process. Adaptive risk bounds of both methods
are developed. Simulations and a real example show superior performance of the
new methods.
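One generic way to make a combination less sensitive to heavy-tailed errors, loosely in the spirit of the abstract above but not the paper's actual algorithms, is sequential exponential weighting under an absolute-error loss rather than a squared loss. Everything in the sketch below, including the learning rate `eta`, is an illustrative assumption.

```python
import numpy as np

def combine_forecasts(F, y, eta=0.5):
    """Sequential combination with multiplicative weights.
    F: (T, k) array of forecasts from k methods; y: (T,) realizations.
    After each period, a method's weight is discounted by
    exp(-eta * |error|); absolute error grows only linearly in an
    outlier, so a single heavy-tailed shock does not wipe out a
    generally good forecaster. Returns the combined forecasts."""
    T, k = F.shape
    w = np.full(k, 1.0 / k)
    combined = np.empty(T)
    for t in range(T):
        combined[t] = w @ F[t]
        w = w * np.exp(-eta * np.abs(F[t] - y[t]))  # robust loss update
        w /= w.sum()
    return combined
```

With one accurate forecaster and one contaminated by Student's t noise, the weights concentrate on the accurate one within a few periods, so the combination outperforms both the contaminated forecaster and the simple average.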